On Ongoing Exchanges

A sign of maturity in a scholarly discipline is the presence of ongoing debates, or, in really 
mature disciplines, titanic feuds (Gore Vidal and William Buckley Jr., Norman Mailer and Noam 
Chomsky, or closer to home B.F. Skinner and Chomsky). A small and hardly titanic instance of 
this form of exchange has taken shape in one corner of applied linguistics and was inadvertently 
launched by a 2007 paper of mine (“Computing the vocabulary demands of L2 reading”). It dealt 
with a question almost everyone in reading and vocabulary has had something to say about, 
whether reading alone can build an adequate reading lexicon in a second language, or whether 
some sort of extra vocabulary training will normally be required. Like the best debating topics, 
this one is riddled with definitional issues (What is adequate?) that form some portion of the 
excitement. My paper was a simple corpus investigation, with rudimentary tools, of the 
reading-alone question, not the in-principle version but just the in-fact one of why typical second 
language (L2) lexicons seem to plateau soon after about 2,500 or 3,000 word families, leaving 
readers with only 90% lexical coverage in the academic texts they must read, which is somewhat 
short of the 95% to 98% coverage that research by Nation (2006) and others has shown to be 
needed for comprehension. 


The Original Finding 

I proposed that one good answer to this question (in the case of adult academic ESL learners 
with one year to prepare for content study in English) was that such learners typically do not 
seek or receive vocabulary training, and so are implicitly volunteering for a ‘reading alone’ 
experiment willy-nilly, and that if you count the number of times they are likely to meet words at 
the 4,000 frequency level and beyond, it will typically not be enough for reliable learning of 
those words to occur. For example, conspiracy, a typical fourth-thousand word (by Nation’s, 
2012, BNC-COCA frequency scheme) will occur in any of its possible forms just 24 times in a 
non-specialist corpus of 1 million words of general US English. One million words is not an 
unusual annual diet for first language (L1) school learners reading self-selected texts, but L2 
learners with 2,500 known word families are unlikely to get through more than a quarter of that 
in a year’s worth of reading, leaving conspiracy and its cohort with roughly six occurrences and 
hence placing them under the borderline of learnability (lately averaged at 12 occurrences). This 
computation, I argued (2007), could reasonably be multiplied throughout the system. Thus is the 
plateau explained and an angle provided on the crisis of the high-paying but often failing foreign 
language learner in US and UK universities. 
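
The arithmetic behind the conspiracy example can be laid out as a minimal sketch. The function names are illustrative only and do not come from the 2007 paper; the figures are the ones given above, and the sketch assumes that a word's occurrences scale linearly with the amount of text read.

```python
def expected_encounters(per_million: float, tokens_read: int) -> float:
    """Expected occurrences of a word family in a given amount of reading."""
    return per_million * tokens_read / 1_000_000


def tokens_needed(per_million: float, threshold: int = 12) -> float:
    """Running words that must be read to meet a word family `threshold` times."""
    return threshold * 1_000_000 / per_million


CONSPIRACY_PER_MILLION = 24   # occurrences per 1 million words of general US English
YEARLY_READING = 250_000      # a quarter of a million words, the L2 estimate above
LEARNABILITY_THRESHOLD = 12   # average number of encounters cited above

met = expected_encounters(CONSPIRACY_PER_MILLION, YEARLY_READING)
needed = tokens_needed(CONSPIRACY_PER_MILLION, LEARNABILITY_THRESHOLD)

print(met)                              # 6.0
print(met >= LEARNABILITY_THRESHOLD)    # False
print(needed)                           # 500000.0
```

On these assumptions a learner would need to read about half a million running words of unsimplified text in a year for a word like conspiracy to reach the 12-encounter mark.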

I thought this counting-up approach was a plausible, evidence-based, solutions-oriented and 
practitioner-usable explanation of some phenomena I and others had been working with in the 
academic reading classroom. But reading alone has its strong adherents, sometimes in 
unexpected places, and they were not slow to find problems in my simple explanation. All they 
have done, however, is tidy up some loose ends and make it stronger. 


The Critics 

McQuillan and Krashen (2008) 

Quick to engage almost as soon as the paper came out were Jeff McQuillan and Stephen Krashen 
(2008). These researchers seemed to have a long list of studies ready to go where L2 readers got 
through vastly more text than Cobb (2007) had estimated, indeed up to a half million words a 
year, which in the case of conspiracy would have given it 12 hits and a potential anchor in the 
lexicon. The only problem, I responded after looking carefully at the data of the studies they 
were citing (Cobb, 2008), was that few instances of conspiracy or other words of the kind would 
be found in the reading in question, since these texts being read at such speed and in such 
quantity were in fact simplified texts where such words would not be found (in any significant 
number). So to my argument should be added the stipulation that sufficient vocabulary will not 
be learned from reading alone, given the amount that learners can actually read of the kind of 
texts in which vocabulary growth would be possible. 

Nation (2014) 

Next onto the field, and with a change of venue from Language Learning & Technology to 
Reading in a Foreign Language, was Paul Nation (2014). I was inclined to look carefully at what 
Nation would want to add or subtract from my argument since I knew he supported a role for 
both direct and indirect instruction in vocabulary growth (unlike my first critics who have an axe 
to grind against any form of unnatural learning). Also Nation is a major player in both the 
coverage research already mentioned and the corpus frequency work on which my argument was 
based. Would Nation’s response force me to modify my position? 

The official brief of Nation’s study is to reconcile Cobb with McQuillan and Krashen but it 
seems more about convincing Cobb that there is somehow a case for the amount of student 
reading McQuillan and Krashen fantasize about, and yet of texts that would actually contain the 
learning content that had originally been at issue. How will Nation make this argument? I 
wondered. 

Like my study, Nation’s (2014) is a corpus investigation of the reading-alone question and seeks 
to compute how many running words of different corpora would be needed simply to contain a 
minimum of 12 instances of each of the most frequent 9,000 words of English (as shown in his 
2006 study to equate to 98% coverage). In other words, it is my study extended with some of the 
resources developed in the intervening seven years. Among his findings, for example, are that 
812 of the fourth thousand word families would be met at least 12 times in a novel’s corpus of 
534,000 running words. This pretty much lines up with my conspiracy example above where six 
occurrences are found in 250,000 words, except Nation (very usefully) works the calculation up 
the scale into the zones where 98% coverage may lie. A summary of his findings, with rounding, 
is that a million words are needed to meet three-quarters of the fifth 1,000 word families 12 times, 
2 million for three-quarters of the seventh 1,000, and 3 million for the ninth-thousand. 

This is a welcome refinement, but why are we talking about ‘three-quarters’ of thousand lists 
being met 12 times? This is almost certainly due to the type of corpus used in the study, which 
consisted entirely of older English novels, which are known to be systematically low on 
‘academic’ or sub-technical vocabulary. For example, items from Coxhead’s (2000) Academic 
Word List comprise just 1.69% of Lawrence’s Lady Chatterley, but over 9% in a random 20,000 
words under the heading ‘technology’ from Wikipedia (as can easily be confirmed at 
www.lextutor.ca/vp/comp/). This is nothing new. Gardner (2004) details the ‘qualitative 
difference’ between narrative and expository lexis (in school texts, but there is no reason to 
think the finding is not general). All this is to say that there is room for yet another run of this 
study using a corpus that is not exclusively literary. 
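
What a figure such as 1.69% or 9% measures can be shown with a minimal sketch. The tiny word set and the two sample sentences below are invented for illustration; they are not the Academic Word List or the texts mentioned above, and the sketch matches surface forms only, whereas a real profile would normally count whole word families.

```python
import re


def list_coverage(text: str, word_list: set[str]) -> float:
    """Percentage of running words (tokens) in `text` that appear in `word_list`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return 100 * hits / len(tokens)


# Hypothetical stand-in for an academic word list (illustration only).
toy_list = {"analysis", "data", "method", "research"}

narrative = "She walked to the old house and looked at the garden"
expository = "The research method relies on analysis of the data"

print(list_coverage(narrative, toy_list))
print(list_coverage(expository, toy_list))
```

Run over a novel and over an expository text with a real word list, the same ratio would give the kind of contrast reported above.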

But how does this interesting extension of my original finding get us any closer to a 
reconciliation with McQuillan and Krashen (2008)? Unfortunately this is only achieved with a 
little wishful thinking. At the beginning of the corpus part of the paper Nation warns that he will be 
“temporarily put[ting] aside the vocabulary load issue” (p. 2) or, in other words, not considering 
whether or how fast learners would be able to read these texts, at what density new words would 
appear, etc. Then when he does return to the load issue in another part of the paper it is only to 
state that all these calculations would only have any practical bearing if we can assume that L2 
learners can learn 1,000 new word families a year, as young L1 learners do: 

If we expect second language learners to increase their vocabulary at around the same yearly rate 
[of 1,000 families a year], then they will need to increase the amount they read each year, 
starting for the 2nd 1000 word level at under 200,000 tokens and rising to 3,000,000 tokens a 
year for the 9th 1000 level. This may be asking a lot, however, “as there is no published research 
to support this figure for learners of English as a foreign language” (Nation, 2014, p. 7). 
Nevertheless, “it is an optimistic goal to aim for” (p. 7). 

“No published research to support this figure” indeed, but there is research to support a different 
figure. Milton and Meara (1995) tallied learners’ vocabulary growth in periods abroad not at 
1,000 but at 550 words a year. Anyway, empirical data is not needed to calculate that even if 
1,000 word families per year were a truly realistic target in L2, then, at this rate, learning 9,000 
words would take (assuming learners left home with about 2,500) six years plus, or about what it 
takes in L1. 
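
The time calculation can be written out as a short sketch. The starting size of 2,500, the 9,000 target, and the two yearly rates are the figures cited in this section; applying the 550 figure to the same gap is an illustrative extrapolation rather than a claim made in the studies cited, and it glosses over the difference between words and word families.

```python
def years_to_target(start: int, target: int, per_year: int) -> float:
    """Years needed to grow a lexicon from `start` to `target` at a constant yearly rate."""
    return (target - start) / per_year


START = 2_500    # approximate lexicon size on arrival
TARGET = 9_000   # size associated with 98% coverage

print(years_to_target(START, TARGET, 1_000))           # 6.5
print(round(years_to_target(START, TARGET, 550), 1))   # 11.8
```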

I find the wishful reasoning of the cited passage, which is reiterated at the end, at odds with the 
number crunching elsewhere in the paper. But it has company. At another point in the same 
paper (p. 7) Nation mentions ‘speed reading’ as a way of making some of this extraordinary 
amount of learning-through-reading happen, despite having said in a different publication (Quinn, 
Nation & Millett, 2007) that in speed reading training materials, “There should be no or very 
little unknown vocabulary” (p. 2). Nation presumably meant here that a learner who had once 
mastered speed reading techniques would then be able to speed read materials with significant 
amounts of new vocabulary as well, but as far as I know this has not been verified in a 
longitudinal study. 

An interesting extra with Nation’s (2014) paper is the announcement of a new set of graded 
readers developed with Laurence Anthony (2013) with the aim of systematically introducing 8,000 
word families to learners through a planned progression. However, no empirical data or even 
corpus figures are provided for how this controlled lexical buildup would work through the 
k-levels (how many hours and years etc.), nor is it entirely clear whether, with so much pedagogical 
design incorporated into such texts, this could really be called reading alone or natural reading. 

So, as for what I would modify in my original conclusion after reading this, it would be only that 
post-3,000 vocabulary pick-up will be hard going for reading alone unless you have six years for the job. 

McQuillan (2016) 

McQuillan’s (2016) paper followed on from both my original response to his paper with Krashen 
(2008) and from Nation’s (2014) paper just discussed. It seems to respond to my critique (that all 
the enormous quantities of L2 reading he and Krashen had cited had merely involved graded 
readers) inasmuch as the paper is entitled, ‘What can readers read after graded readers?’ The 
study is an uncharacteristic (for McQuillan) number crunching corpus study that starts with a 
description of Nation and Anthony’s (2013) advanced graded readers, which I thought sounded 
promising given that Nation had not really dealt with it, only to learn that McQuillan does not 
feel this is at all the way to go after graded readers. Instead, he has worked out at length in a sort 
of stacked young people’s story corpus that it is possible for learners to spontaneously come 
across naturally occurring learning sequences in typical unsimplified contemporary novels 
(Twilight, Harry Potter, which of course come at different naturally occurring vocabulary 
densities and compositions) that can take readers through the levels all the way to 9,000 word 
families (with 12 hits for new words, suitable ratios of known to unknown, etc.) via an enjoyable 
and thoroughly natural set of reading experiences. This would not involve any direct instruction: 
“It would be wrong to conclude based on the results of this study that adult L2 readers should 
test their vocabulary levels and attempt to ‘match’ themselves exactly to texts using a 95% or 
98% vocabulary coverage criterion” (p. 74). 

No, this can all be done without planning, calculation, or even a teacher’s help, but simply by 
readers self-selecting what they wish to read. Not quickly, however. Even learners who did 
discover the ideal sequence of reading experiences to get them all the way to Nation’s 9,000 
would need “a little over three years of reading” (McQuillan, 2016, p. 65). 

Again, as in Nation’s (2014) study but to a greater extent, a mass of numbers about coverage and 
k-levels has been assembled only to vaguely background a take-home message about the power 
of reading alone that does not follow from them. Where is the evidence that any significant 
number of learners can self-select their path to complete lexical development over a three-year 
period? For this, McQuillan (2016) sends us off to read Nell (1988), which is a fine but irrelevant 
study disclosing the many benefits (does anyone doubt it?) of pleasurable fiction reading for 
native and near-native speakers but has nothing at all about the vocabulary knowledge of the 
learners or the vocabulary challenge of their texts. It especially has nothing about whether the 
subjects were able to self-select incrementally challenging texts in any sort of progression. The 
word ‘vocabulary’ does not occur even once in the 40-page paper. 

Evidence bearing more directly on McQuillan’s (2016) claim can, however, be found. A study 
by McCrostie (2007) found that neither learners nor even their teachers could reliably predict the 
vocabulary learning burden of texts or even individual words. (That is why we use corpora and 
computer crunching to help with some of this.) And then there is the matter of the exclusive diet 
of literary fiction in both McQuillan’s (2016) and Nell’s (1988) studies, which as noted already 
does not represent the full lexicon. So, what I would add to my original formulation from a 
reading of McQuillan (2016) is that post-3,000 vocabulary pick-up will be hard going for reading 
alone unless you have three years for the job and can magically determine, unaided, an ideal 
sequence of incrementally challenging texts to fill that period. 


The Modified Finding? 

In fact, I think I will not encumber my original finding with any farfetched stipulations about 
what could happen theoretically or in principle, or with unreal infusions of motivation or extra 
amounts of time that no real academic learner will ever be given, and will leave the formulation as it 
was. L2 academic readers who begin their studies with 2,500 to 3,000 word families will need some 
help to get over the hump. It’s called teaching. 